FOIL-D: Efficiently Scaling FOIL for Multi-relational Data Mining of Large Datasets

Authors

  • Joseph Bockhorst
  • Irene M. Ong

Abstract

Multi-relational rule mining is important for knowledge discovery in relational databases because it allows the discovery of patterns that involve multiple relational tables. Inductive logic programming (ILP) techniques have had considerable success on a variety of multi-relational rule mining tasks; however, most ILP systems do not scale to very large datasets. In this paper we present two extensions to a popular ILP system, FOIL, that improve its scalability. (i) We show how to interface FOIL directly to a relational database management system, which enables FOIL to run on datasets that were previously out of its scope. (ii) We describe histogram-based estimation methods that significantly decrease the computational cost of learning a set of rules. Experimental results on a set of standard ILP datasets indicate that the rule sets learned with our extensions are equivalent to those learned with standard FOIL, at considerably lower cost.
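
As background to extension (i): FOIL scores each candidate literal with Quinlan's information gain, which needs only a handful of counts (positive and negative bindings before and after adding the literal, plus the number of positive bindings that survive). The sketch below is a minimal illustration, assuming a hypothetical SQLite schema (tables `ex` and `edge`) and hypothetical helper names (`counts_from_db`, `build_example_db`), of how those counts can be obtained with SQL aggregate queries instead of in-memory tuple sets. It is not the paper's implementation, and it computes the counts exactly rather than with the paper's histogram-based estimates, whose details are not given here.

```python
import math
import sqlite3

# Hypothetical schema, for illustration only:
#   ex(x, label): training examples of the target relation; label 1 = positive, 0 = negative
#   edge(a, b):   a background relation; the candidate literal is edge(X, Y)


def foil_gain(p0: int, n0: int, p1: int, n1: int, t: int) -> float:
    """Quinlan's FOIL information gain for adding one literal to a clause.

    p0, n0: positive / negative bindings before the literal is added.
    p1, n1: positive / negative bindings after the literal is added.
    t: positive bindings from before that still have at least one extension.
    """
    if p0 == 0 or p1 == 0:
        return 0.0  # nothing positive to cover, so the literal earns no gain
    before = math.log2(p0 / (p0 + n0))
    after = math.log2(p1 / (p1 + n1))
    return t * (after - before)


def counts_from_db(conn: sqlite3.Connection) -> tuple:
    """Compute (p0, n0, p1, n1, t) for the candidate literal edge(X, Y) via SQL."""
    cur = conn.cursor()
    # Bindings before the literal: one per training example x.
    p0, n0 = cur.execute(
        "SELECT SUM(label = 1), SUM(label = 0) FROM ex"
    ).fetchone()
    # Bindings after the literal: one per (x, y) row produced by the join.
    p1, n1 = cur.execute(
        "SELECT COALESCE(SUM(ex.label = 1), 0), COALESCE(SUM(ex.label = 0), 0) "
        "FROM ex JOIN edge ON ex.x = edge.a"
    ).fetchone()
    # Positive examples that still have at least one extension after the join.
    (t,) = cur.execute(
        "SELECT COUNT(DISTINCT ex.x) FROM ex JOIN edge ON ex.x = edge.a "
        "WHERE ex.label = 1"
    ).fetchone()
    return p0, n0, p1, n1, t


def build_example_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with the hypothetical schema above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ex (x INTEGER, label INTEGER)")
    conn.execute("CREATE TABLE edge (a INTEGER, b INTEGER)")
    conn.executemany("INSERT INTO ex VALUES (?, ?)",
                     [(1, 1), (2, 1), (3, 0), (4, 0)])
    conn.executemany("INSERT INTO edge VALUES (?, ?)",
                     [(1, 10), (1, 11), (2, 12), (3, 13)])
    return conn


if __name__ == "__main__":
    counts = counts_from_db(build_example_db())
    print(counts, round(foil_gain(*counts), 4))  # (2, 2, 3, 1, 2) 1.1699
```

Pushing the counting into the database means only five integers cross the database boundary per candidate literal, rather than every binding tuple. Counts over large joins, such as `p1` and `n1` here, are the natural place where an estimate (for example, one derived from histograms) could stand in for an exact query; this sketch does not attempt that.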

Similar resources

An approach to mining the multi-relational imbalanced database

The class imbalance problem is an important issue in classification of Data mining. For example, in the applications of fraudulent telephone calls, telecommunications management, and rare diagnoses, users would be more interested in the minority than the majority. Although there are many proposed algorithms to solve the imbalanced problem, they are unsuitable to be directly applied on a multire...

Towards Structural Logistic Regression: Combining Relational and Statistical Learning

Inductive logic programming (ILP) techniques are useful for analyzing data in multi-table relational databases. Learned rules can potentially discover relationships that are not obvious in "flattened" data. Statistical learners, on the other hand, are generally not constructed to search relational data; they expect to be presented with a single table containing a set of feature candidates. Howe...

Three Companions for First Order Data Mining

Three companion systems, Claudien, ICL and Tilde, are presented. They use a common representation for examples and hypotheses: each example is represented by a relational database. This contrasts with the classical inductive logic programming systems such as Progol and Foil. It is argued that this representation is closer to attribute value learning and hence more natural. Furthermore, the thre...

Three Companions for Data Mining in First Order Logic

Three companion systems, Claudien, ICL and Tilde, are presented. They use a common representation for examples and hypotheses: each example is represented by a relational database. This contrasts with the classical inductive logic programming systems such as Progol and Foil. It is argued that this representation is closer to attribute value learning and hence more natural. Furthermore, the thre...

Comparison of Three Parallel Implementations of an Induction Algorithm

Recently, researchers have tried to apply ILP to KDD because ILP enlarges the applicability of Machine Learning to cover KDD and Data Mining: it enables them to learn from multiple relational tables. Many scientific discovery systems are motivated from the desire to deal with larger databases. However the larger the databases are, the more computational power we need. Parallel computing is a pos...

Publication date: 2004